From raw images of the lips to articulatory parameters: a viseme-based prediction
Abstract
This paper presents a method for extracting articulatory parameters directly from raw images of the lips. The system architecture consists of three independent stages. First, each new greyscale mouth image is centred and downsampled. Second, the image is aligned and projected onto a basis of artificial images: the eigenvectors computed by a PCA applied to a set of 23 reference lip shapes. Third, a multilinear interpolation predicts articulatory parameters from the coefficients of the image's projection onto the eigenvectors. In addition, the projection coefficients and the predicted parameters were evaluated with an HMM-based visual speech recogniser. Recognition scores obtained with our method are compared with reference scores and discussed.
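The pipeline above (projection onto PCA eigenimages, then a linear prediction of articulatory parameters from the projection coefficients) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the image size, the number of retained eigenvectors, the number of articulatory parameters, and the use of a least-squares linear map in place of the paper's multilinear interpolation are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 23 reference lip-shape images, flattened to vectors
# (32x32 pixels is an illustrative assumption).
refs = rng.standard_normal((23, 32 * 32))
mean_img = refs.mean(axis=0)

# PCA via SVD of the centred reference set; keep the leading eigenimages.
_, _, vt = np.linalg.svd(refs - mean_img, full_matrices=False)
eigenimages = vt[:10]                     # (10, 1024) orthonormal basis

def project(img_flat):
    """Projection coefficients of a centred image onto the eigenimage basis."""
    return eigenimages @ (img_flat - mean_img)

# Stand-in for the multilinear interpolation: a linear map from projection
# coefficients to articulatory parameters, fitted on the reference set
# (the parameter values here are synthetic placeholders).
coeffs_ref = np.array([project(r) for r in refs])   # (23, 10)
params_ref = rng.standard_normal((23, 3))           # e.g. 3 parameters
W, *_ = np.linalg.lstsq(coeffs_ref, params_ref, rcond=None)

# Predict articulatory parameters for a new mouth image.
new_img = rng.standard_normal(32 * 32)
predicted = project(new_img) @ W                    # shape (3,)
```

The projection step is cheap at run time: once the eigenimages are precomputed, each incoming frame costs one matrix-vector product plus the final linear prediction.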
Similar references
Towards an Audiovisual Virtual Talking Head: 3D Articulatory Modeling of Tongue, Lips and Face Based on MRI and Video Images
A linear three-dimensional articulatory model of tongue, lips and face is presented. The model is based on a linear component analysis of the 3D coordinates defining the geometry of the different organs, obtained from Magnetic Resonance Imaging of the tongue, and from front and profile video images of the subject’s face marked with small beads. In addition to a common jaw height parameter, the ...
Creating and controlling video-realistic talking heads
We present a linear three-dimensional modeling paradigm for lips and face that captures the audiovisual speech activity of a given speaker with only six parameters. Our articulatory models are constructed from real data (front and profile images), using a linear component analysis of about 200 3D coordinates of fleshpoints on the subject's face and lips. Compared to a raw component analysis, our...
Photo-realistic visual speech synthesis based on AAM features and an articulatory DBN model with constrained asynchrony
This paper presents a photo-realistic visual speech synthesis method based on an audio-visual articulatory dynamic Bayesian network model (AF_AVDBN) in which the maximum asynchronies between the articulatory features, such as lips, tongue and glottis/velum, can be controlled. Perceptual linear prediction (PLP) features from the audio speech and active appearance model (AAM) features from mouth ...
An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging
Ultrasound imaging of the tongue and videos of lip movements can be used to investigate specific articulation in speech or singing voice. In this study, tongue and lip image sequences recorded during singing performance are used to predict vocal tract properties via Line Spectral Frequencies (LSF). We focused our work on traditional Corsican singing "Cantu in paghjella". A multimodal Deep Aut...
Inner lips feature extraction based on CLNF with hybrid dynamic template for Cued Speech
In previous French Cued Speech (CS) studies, one of the widely used methods is painting blue color on the speaker’s lips to make lips feature extraction easier. In this paper, in order to get rid of this artifice, a novel automatic method to extract the inner lips contour of CS speakers is presented. This method is based on a recent facial contour extraction model developed in computer vision, ...
Publication date: 1997